You will analyse the dataset by preprocessing the product images and descriptions, reducing dimensionality, and then clustering. The clustering results will be presented as a two-dimensional representation (to be chosen) illustrating that the extracted features make it possible to group products of the same category.
This graphical representation will help convince Linda that this modelling approach can indeed group products of the same category together.
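A two-dimensional view of the extracted features can be obtained with a projection such as t-SNE. The sketch below, using random placeholder `features` and `labels` (the real ones come from the pipeline built in this notebook), shows the shape of such a projection:

```python
# Sketch: project extracted features to 2D with t-SNE to check visually
# whether same-category products land close together.
# `features` and `labels` are placeholders for the real extracted features.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(210, 64))  # placeholder: one row per product
labels = np.repeat(np.arange(7), 30)   # placeholder: 7 categories

embedding = TSNE(n_components=2, init="pca", perplexity=30,
                 random_state=0).fit_transform(features)
print(embedding.shape)  # (210, 2)
```

Each row of `embedding` can then be scattered with `plt.scatter`, coloured by `labels`, to produce the convincing 2D plot.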
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Image feature extraction and clustering
from skimage.io import imread
from IPython.display import Image
import os
import cv2
from sklearn.cluster import MiniBatchKMeans
from xgboost import XGBRFClassifier
from sklearn.metrics import accuracy_score
# Text preprocessing and classification
import re
import nltk
from nltk.tokenize import word_tokenize as wt
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
data = pd.read_csv('flipkart_com-ecommerce_sample_1050.csv')
print(data.shape)
display(data.head(2))
(1050, 15)
uniq_id | crawl_timestamp | product_url | product_name | product_category_tree | pid | retail_price | discounted_price | image | is_FK_Advantage_product | description | product_rating | overall_rating | brand | product_specifications | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 55b85ea15a1536d46b7190ad6fff8ce7 | 2016-04-30 03:22:56 +0000 | http://www.flipkart.com/elegance-polyester-mul... | Elegance Polyester Multicolor Abstract Eyelet ... | ["Home Furnishing >> Curtains & Accessories >>... | CRNEG7BKMFFYHQ8Z | 1899.0 | 899.0 | 55b85ea15a1536d46b7190ad6fff8ce7.jpg | False | Key Features of Elegance Polyester Multicolor ... | No rating available | No rating available | Elegance | {"product_specification"=>[{"key"=>"Brand", "v... |
1 | 7b72c92c2f6c40268628ec5f14c6d590 | 2016-04-30 03:22:56 +0000 | http://www.flipkart.com/sathiyas-cotton-bath-t... | Sathiyas Cotton Bath Towel | ["Baby Care >> Baby Bath & Skin >> Baby Bath T... | BTWEGFZHGBXPHZUH | 600.0 | 449.0 | 7b72c92c2f6c40268628ec5f14c6d590.jpg | False | Specifications of Sathiyas Cotton Bath Towel (... | No rating available | No rating available | Sathiyas | {"product_specification"=>[{"key"=>"Machine Wa... |
display(data.info())
display(data.describe(include='all'))
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1050 entries, 0 to 1049
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   uniq_id                  1050 non-null   object
 1   crawl_timestamp          1050 non-null   object
 2   product_url              1050 non-null   object
 3   product_name             1050 non-null   object
 4   product_category_tree    1050 non-null   object
 5   pid                      1050 non-null   object
 6   retail_price             1049 non-null   float64
 7   discounted_price         1049 non-null   float64
 8   image                    1050 non-null   object
 9   is_FK_Advantage_product  1050 non-null   bool
 10  description              1050 non-null   object
 11  product_rating           1050 non-null   object
 12  overall_rating           1050 non-null   object
 13  brand                    712 non-null    object
 14  product_specifications   1049 non-null   object
dtypes: bool(1), float64(2), object(12)
memory usage: 116.0+ KB
None
uniq_id | crawl_timestamp | product_url | product_name | product_category_tree | pid | retail_price | discounted_price | image | is_FK_Advantage_product | description | product_rating | overall_rating | brand | product_specifications | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 1050 | 1050 | 1050 | 1050 | 1050 | 1050 | 1049.000000 | 1049.000000 | 1050 | 1050 | 1050 | 1050 | 1050 | 712 | 1049 |
unique | 1050 | 149 | 1050 | 1050 | 642 | 1050 | NaN | NaN | 1050 | 2 | 1050 | 27 | 27 | 490 | 984 |
top | aa82b75da7579007963e53b6f818281b | 2015-12-01 12:40:44 +0000 | http://www.flipkart.com/wallmantra-christ-rede... | Vitamins Solid Baby Girl's Basic Shorts | ["Home Furnishing >> Bed Linen >> Blankets, Qu... | NPYEYC9ZV2CUG9H3 | NaN | NaN | d136aa676ef52b09eab65762940957fe.jpg | False | arnavs Multi1 Bottle Opener Set Price: Rs. 250... | No rating available | No rating available | PRINT SHAPES | {"product_specification"=>[{"key"=>"Type", "va... |
freq | 1 | 150 | 1 | 1 | 56 | 1 | NaN | NaN | 1 | 993 | 1 | 889 | 889 | 11 | 22 |
mean | NaN | NaN | NaN | NaN | NaN | NaN | 2186.197331 | 1584.527169 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
std | NaN | NaN | NaN | NaN | NaN | NaN | 7639.229411 | 7475.099680 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
min | NaN | NaN | NaN | NaN | NaN | NaN | 35.000000 | 35.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
25% | NaN | NaN | NaN | NaN | NaN | NaN | 555.000000 | 340.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
50% | NaN | NaN | NaN | NaN | NaN | NaN | 999.000000 | 600.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
75% | NaN | NaN | NaN | NaN | NaN | NaN | 1999.000000 | 1199.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
max | NaN | NaN | NaN | NaN | NaN | NaN | 201000.000000 | 201000.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
# missing values
data.isna().sum()
uniq_id                      0
crawl_timestamp              0
product_url                  0
product_name                 0
product_category_tree        0
pid                          0
retail_price                 1
discounted_price             1
image                        0
is_FK_Advantage_product      0
description                  0
product_rating               0
overall_rating               0
brand                      338
product_specifications       1
dtype: int64
data['retail_price'] = data['retail_price'].fillna(data['retail_price'].mean())
data['discounted_price'] = data['discounted_price'].fillna(data['discounted_price'].mean())
data['brand'] = data['brand'].fillna("")
data['product_specifications'] = data['product_specifications'].fillna("")
# missing values
data.isna().sum()
uniq_id                    0
crawl_timestamp            0
product_url                0
product_name               0
product_category_tree      0
pid                        0
retail_price               0
discounted_price           0
image                      0
is_FK_Advantage_product    0
description                0
product_rating             0
overall_rating             0
brand                      0
product_specifications     0
dtype: int64
# Example: category tree of product 0
print(data["product_category_tree"][0])
["Home Furnishing >> Curtains & Accessories >> Curtains >> Elegance Polyester Multicolor Abstract Eyelet Do..."]
# Extract level-1 and level-2 categories from the "product_category_tree" column
list_categories_1=[]
list_categories_2 =[]
for txt in data["product_category_tree"] :
list_categories_1.append(txt.split(">>")[0].split("\"")[1].strip())
list_categories_2.append(txt.split(">>")[1].strip())
data["categories_niv1"] = pd.Series(list_categories_1)
data["categories_niv2"] = pd.Series(list_categories_2)
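The quote-splitting above relies on the exact string layout of `product_category_tree`, which stores a string that looks like a JSON list (e.g. `'["A >> B >> C"]'`). A slightly more defensive parse, sketched below with a hypothetical helper `split_category_tree`, unwraps the list with `ast.literal_eval` before splitting on `>>`:

```python
# Sketch: parse '["A >> B >> C"]' into its first `depth` category levels.
import ast

def split_category_tree(raw, depth=2):
    tree = ast.literal_eval(raw)[0]  # '["A >> B >> C"]' -> 'A >> B >> C'
    parts = [p.strip() for p in tree.split(">>")]
    return parts[:depth]

print(split_category_tree(
    '["Home Furnishing >> Curtains & Accessories >> Curtains"]'))
# ['Home Furnishing', 'Curtains & Accessories']
```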
print(data["categories_niv1"].nunique(), "level-1 categories")
print('----')
for i in data["categories_niv1"].unique():
    print(i, len(data[data["categories_niv1"]==i]))
print("-> on average we have ~150 images per category at depth 1")
# Barplot
fig = plt.figure(1, figsize=(15, 5))
plt.subplot(1, 2, 1)
plt.title("Number of FlipKart products in each level-1 category")
counts_niv1 = data["categories_niv1"].value_counts()
sns.barplot(x=counts_niv1.index, y=counts_niv1.values)
plt.xticks(rotation=45)
plt.show()
7 level-1 categories
----
Home Furnishing 150
Baby Care 150
Watches 150
Home Decor & Festive Needs 150
Kitchen & Dining 150
Beauty and Personal Care 150
Computers 150
-> on average we have ~150 images per category at depth 1
print(data["categories_niv2"].nunique(), "level-2 categories")
print('----')
print("-> the number of images per category at depth 2 varies widely")
fig = plt.figure(1, figsize=(45, 10))
# Barplot
plt.subplot(1, 2, 1)
plt.title("Number of products in each level-2 category")
counts_niv2 = data["categories_niv2"].value_counts()
sns.barplot(x=counts_niv2.index, y=counts_niv2.values)
plt.xticks(rotation=60)
plt.show()
63 level-2 categories
----
-> the number of images per category at depth 2 varies widely
df=data[['categories_niv1','image']].copy()
dataWCL = data[['categories_niv1','description']].copy()
# Export the prepared data file.
data[['categories_niv1','image','description']].to_csv('data_Img_cat_desc.csv',index=False)
df.drop(343,axis=0,inplace=True)
dico = []        # all ORB descriptors, one flattened array per keypoint
target_arr = []  # numeric category of each image (Baby Care=0, ..., Watches=6)
# Can take a really long time (up to ~18 min)... but it works
datadir='./Images/'
list_image = []
orb = cv2.ORB_create()  # ORB keypoint detector/descriptor extractor
for img in os.listdir(datadir):
    list_image.append(img)
    img_array = imread(os.path.join(datadir, img))
    gray_img = cv2.cvtColor(img_array, cv2.COLOR_BGR2GRAY)
    kp, des = orb.detectAndCompute(gray_img, None)
    if des is None:  # skip images with no detectable keypoints
        continue
    for d in des:
        dico.append(d)
C:\Users\malik\anaconda3\lib\site-packages\PIL\Image.py:2834: DecompressionBombWarning: Image size (93680328 pixels) exceeds limit of 89478485 pixels, could be decompression bomb DOS attack. warnings.warn(
Converting the categorical category labels to numeric codes
df.categories_niv1.unique()
array(['Home Furnishing', 'Baby Care', 'Watches', 'Home Decor & Festive Needs', 'Kitchen & Dining', 'Beauty and Personal Care', 'Computers'], dtype=object)
df= df[df['image'].isin(list_image)].copy()
listOfCategories = ['Baby Care','Beauty and Personal Care','Computers','Home Decor & Festive Needs','Home Furnishing','Kitchen & Dining','Watches']
df.replace(listOfCategories, list(range(len(listOfCategories))), inplace=True)
# 2 photos have no category; after inspecting them we assign category 5
df.fillna(value=5,inplace=True)
df['categorie'] = df['categories_niv1'].astype(int)
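The replace-then-cast steps above can also be done in one step with pandas Categorical codes, which assign 0..n-1 following alphabetical order of the labels (the same order as `listOfCategories`). A minimal sketch on a toy Series:

```python
# Sketch: encode category labels as integer codes in one step.
import pandas as pd

s = pd.Series(['Watches', 'Baby Care', 'Computers', 'Baby Care'])
codes = pd.Categorical(s).codes  # categories sorted alphabetically
print(list(codes))  # [2, 0, 1, 0]
```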
We now have a very large number of descriptors per image in one array. It is impossible to use them all directly, so we group them into clusters (a visual vocabulary).
k = np.size(listOfCategories) * 10  # 10 visual words per category
# batch size = number of images * 3
batch_size = np.size(os.listdir(datadir)) * 3
kmeans = MiniBatchKMeans(n_clusters=k, batch_size=batch_size, verbose=1).fit(dico)
[MiniBatchKMeans verbose output, abridged: three k-means++ initialisations with inertia ≈ 1.042e9–1.045e9, then minibatch iterations whose EWA inertia decreases steadily from ≈110,913 (iteration 1/16100) to ≈107,015 (iteration 277/16100); log truncated.]
ewa inertia: 107016.503609 Minibatch iteration 279/16100: mean batch inertia: 107126.612698, ewa inertia: 107017.878111 Minibatch iteration 280/16100: mean batch inertia: 105862.170164, ewa inertia: 107003.451302 Minibatch iteration 281/16100: mean batch inertia: 106488.277986, ewa inertia: 106997.020346 Minibatch iteration 282/16100: mean batch inertia: 106461.821252, ewa inertia: 106990.339406 Minibatch iteration 283/16100: mean batch inertia: 106464.736203, ewa inertia: 106983.778253 Minibatch iteration 284/16100: mean batch inertia: 105446.417702, ewa inertia: 106964.587240 Minibatch iteration 285/16100: mean batch inertia: 106994.129328, ewa inertia: 106964.956016 Minibatch iteration 286/16100: mean batch inertia: 106373.586240, ewa inertia: 106957.573893 Minibatch iteration 287/16100: mean batch inertia: 107256.435014, ewa inertia: 106961.304604 Minibatch iteration 288/16100: mean batch inertia: 107518.317362, ewa inertia: 106968.257845 Minibatch iteration 289/16100: mean batch inertia: 107064.853646, ewa inertia: 106969.463660 Minibatch iteration 290/16100: mean batch inertia: 107043.837999, ewa inertia: 106970.392081 Minibatch iteration 291/16100: mean batch inertia: 106142.188603, ewa inertia: 106960.053541 Minibatch iteration 292/16100: mean batch inertia: 107001.014650, ewa inertia: 106960.564862 Minibatch iteration 293/16100: mean batch inertia: 106359.944611, ewa inertia: 106953.067265 Minibatch iteration 294/16100: mean batch inertia: 105853.227387, ewa inertia: 106939.337862 Minibatch iteration 295/16100: mean batch inertia: 108041.692045, ewa inertia: 106953.098651 Minibatch iteration 296/16100: mean batch inertia: 106966.889643, ewa inertia: 106953.270805 Minibatch iteration 297/16100: mean batch inertia: 107219.564858, ewa inertia: 106956.594978 Minibatch iteration 298/16100: mean batch inertia: 107229.300322, ewa inertia: 106959.999184 Minibatch iteration 299/16100: mean batch inertia: 107371.027034, ewa inertia: 106965.130083 Minibatch iteration 
300/16100: mean batch inertia: 107755.053458, ewa inertia: 106974.990769 Minibatch iteration 301/16100: mean batch inertia: 106485.129309, ewa inertia: 106968.875783 Minibatch iteration 302/16100: mean batch inertia: 107398.536248, ewa inertia: 106974.239274 Minibatch iteration 303/16100: mean batch inertia: 106024.745196, ewa inertia: 106962.386652 Minibatch iteration 304/16100: mean batch inertia: 106841.172053, ewa inertia: 106960.873519 Converged (lack of improvement in inertia) at iteration 304/16100 Computing label assignment and total inertia
Here we build one histogram per image, so each image becomes a vector of k values. For every keypoint in an image, we find its nearest cluster centre and increase the corresponding bin by 1/nkp, which yields a normalized histogram directly.
kmeans.verbose = False
histo_list = []
for img in os.listdir(datadir):
    img_array = imread(os.path.join(datadir, img))
    gray_img = cv2.cvtColor(img_array, cv2.COLOR_BGR2GRAY)
    kp, des = orb.detectAndCompute(gray_img, None)
    histo = np.zeros(k)
    nkp = np.size(kp)
    if des is not None and nkp > 0:  # guard: some images yield no keypoints
        for d in des:
            idx = kmeans.predict([d])
            histo[idx] += 1 / nkp  # add 1/nkp directly so the histogram is normalized
    histo_list.append(histo)
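Calling `predict` once per descriptor is slow; a minimal sketch of a vectorized equivalent, on toy data standing in for the fitted visual vocabulary and one image's ORB descriptors (`k`, `kmeans`, and `des` here are stand-ins, not the notebook's real objects):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Toy stand-ins: a fitted k-cluster vocabulary and 40 descriptors for one image.
rng = np.random.default_rng(0)
k = 5
kmeans = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=0).fit(rng.random((200, 32)))
des = rng.random((40, 32))  # hypothetical descriptors for a single image

# One predict() call for all descriptors, then count occurrences per cluster.
words = kmeans.predict(des)
histo = np.bincount(words, minlength=k) / len(des)  # normalized histogram
```

Dividing by the number of descriptors gives the same normalization as adding 1/nkp per keypoint.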
plt.hist(histo_list[0])
len(histo_list)
1049
X = np.array(histo_list)
Y = df['categories_niv1']
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(X,Y,test_size=0.33,random_state=77)
mod = XGBRFClassifier()
mod.fit(x_train,y_train)
print(f"The model Xgboost is {accuracy_score(mod.predict(x_test),y_test)*100}% accurate")
The model Xgboost is 16.714697406340058% accurate
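To judge whether ~16.7% is meaningful, it helps to compare against a majority-class baseline; with 7 roughly balanced categories, chance level is about 1/7 ≈ 14.3%. A minimal sketch on synthetic labels (the data here is hypothetical, mimicking the 7-class setup):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical 7-class labels mimicking the seven product categories.
rng = np.random.default_rng(0)
y = rng.integers(0, 7, size=700)
X = rng.random((700, 10))

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
base_acc = accuracy_score(y, baseline.predict(X))
print(f"majority-class baseline: {base_acc:.1%}")  # ~1/7 for balanced classes
```

The image-feature model is therefore barely above chance, which motivates trying text features next.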
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import seaborn as sns
def reduction_visualisation(method):
    if method == "PCA":
        mod = PCA(n_components=2)
        title = "PCA decomposition"  # for the plot
    elif method == "TSNE":
        mod = TSNE(n_components=2)
        title = "TSNE decomposition"  # for the plot
    else:
        raise ValueError(f"Unknown method: {method}")
    # Fit and transform the features
    principal_components = mod.fit_transform(X)
    # Put them into a dataframe
    df_features = pd.DataFrame(data=principal_components,
                               columns=['PC1', 'PC2'])
    # Map the numeric labels back to category names (no-op if already mapped)
    df.replace([0, 1, 2, 3, 4, 5, 6],
               ['Baby Care', 'Beauty and Personal Care', 'Computers',
                'Home Decor & Festive Needs', 'Home Furnishing',
                'Kitchen & Dining', 'Watches'],
               inplace=True)
    df_labels = df['categories_niv1']
    df_full = pd.concat([df_features, df_labels], axis=1)
    # Plot
    plt.figure(figsize=(10, 10))
    sns.scatterplot(x='PC1',
                    y='PC2',
                    hue="categories_niv1",
                    data=df_full,
                    palette=["red", "pink", "royalblue", "greenyellow",
                             "lightseagreen", "black", "purple"],
                    alpha=.7).set_title(title);
reduction_visualisation('PCA')
reduction_visualisation('TSNE')
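When the PCA scatter looks uninformative, it is worth checking how much variance the first two components actually retain; if it is low, the 2-D projection cannot be expected to separate the categories. A minimal sketch on synthetic features standing in for the k-dimensional histograms (shapes here are illustrative, not the notebook's real data):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the bag-of-visual-words feature matrix.
rng = np.random.default_rng(0)
X_demo = rng.random((100, 20))

pca = PCA(n_components=2).fit(X_demo)
var = pca.explained_variance_ratio_
print(f"variance kept by PC1+PC2: {var.sum():.1%}")
```

A low retained-variance figure is one reason t-SNE, which optimizes local neighbourhoods instead, can give a more readable 2-D map.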
nltk.download('punkt')
nltk.download('stopwords')
stemmer = PorterStemmer()
dataset = dataWCL.copy()
stop_words = set(stopwords.words('english'))  # build the set once, not per word
data = []
for i in range(dataset.shape[0]):
    description = dataset.iloc[i, 1]
    # remove non-alphabetic characters
    description = re.sub('[^A-Za-z]', ' ', description)
    # lowercase, so that "Example" and "example" count as the same word
    description = description.lower()
    # tokenising
    tokenized_description = wt(description)
    # remove stop words and stem
    text_processed = [stemmer.stem(word)
                      for word in tokenized_description
                      if word not in stop_words]
    data.append(" ".join(text_processed))
# creating the feature matrix
matrix = CountVectorizer(max_features=1000)
X = matrix.fit_transform(data).toarray()
y = dataset.iloc[:, 0]
# split train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y)
# Naive Bayes
classifier = GaussianNB()
classifier.fit(X_train, y_train)
# predict class
y_pred = classifier.predict(X_test)
# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
cr = classification_report(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
print("GaussianNB accuracy score is: ",accuracy)
GaussianNB accuracy score is: 0.8479087452471483
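The feature matrix behind this score is a plain bag-of-words count: `CountVectorizer` assigns one column per vocabulary token and counts occurrences per document. A minimal sketch on a toy corpus (the three product-like strings are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Invented mini-corpus of product descriptions.
docs = ["blue cotton curtain", "steel kitchen knife", "cotton baby blanket"]
vec = CountVectorizer(max_features=1000)
X_bow = vec.fit_transform(docs).toarray()
print(sorted(vec.vocabulary_))  # one column per distinct token
```

With `max_features=1000` as in the notebook, only the 1000 most frequent tokens would be kept on a real corpus; this toy one has far fewer.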
from sklearn.metrics import ConfusionMatrixDisplay
disp = ConfusionMatrixDisplay(confusion_matrix=cm,display_labels=classifier.classes_)
disp.plot()
plt.show()
# Classification report
print(cr)
                            precision    recall  f1-score   support

                 Baby Care       0.74      0.64      0.69        36
  Beauty and Personal Care       0.74      0.82      0.78        34
                 Computers       0.89      0.89      0.89        37
Home Decor & Festive Needs       0.91      0.78      0.84        40
           Home Furnishing       0.85      0.93      0.89        42
          Kitchen & Dining       0.90      0.93      0.91        40
                   Watches       0.89      0.94      0.91        34

                  accuracy                           0.85       263
                 macro avg       0.85      0.85      0.84       263
              weighted avg       0.85      0.85      0.85       263
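The per-class figures in the report come straight from the confusion matrix: precision divides each diagonal entry by its column sum (predicted counts), recall by its row sum (true counts). A minimal sketch on an invented 2-class matrix:

```python
import numpy as np

# Hypothetical confusion matrix (rows: true class, columns: predicted class).
cm_demo = np.array([[50, 10],
                    [ 5, 35]])
precision = np.diag(cm_demo) / cm_demo.sum(axis=0)  # correct / predicted per class
recall    = np.diag(cm_demo) / cm_demo.sum(axis=1)  # correct / actual per class
```

Reading the report this way makes it easy to spot, for example, that Baby Care is the weakest class here on both axes.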
reduction_visualisation('PCA')
reduction_visualisation('TSNE')